
Computer Engineering


Research on Watermarking Attack Technology of Computer Vision Models


Online: 2025-11-26, Published: 2025-11-26

Research on Model Watermarking Attack Technology Based on Deep Neural Networks

Abstract: Model intellectual property protection has become an issue that cannot be ignored in model security. Watermarking, as the core means of model traceability, supports copyright verification by embedding special identifiers into model parameters or generated content. However, trained watermarked models are easily copied and redistributed, which allows attackers to destroy or remove the watermarks embedded in DNN models through techniques such as fine-tuning, pruning, or adversarial example attacks, so that model ownership can no longer be verified. To provide a deeper understanding of model watermarking attacks, this paper first introduces them and then classifies attack methods into two categories, white-box and black-box watermarking attacks, according to the attacker's access rights to the target model and ability to obtain information about it. It then sorts out and analyzes the motives, harms, attack principles, and concrete implementations of DNN model watermarking attacks, and compares and summarizes existing research in terms of attacker capability and performance impact. Finally, it explores the potential positive role of neural network model watermarking attacks in future research and offers suggestions for further work on model security and intellectual property protection.
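The pruning attack mentioned above can be illustrated with a minimal sketch. The idea, as surveyed in the literature, is that many white-box watermarks are carried partly by low-magnitude, low-importance parameters, so zeroing the smallest weights can corrupt the embedded identifier while largely preserving task accuracy. The toy weight matrix and the `magnitude_prune` helper below are illustrative assumptions, not any specific attack from the paper:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    A simplified pruning-based watermark-removal step: watermark bits
    hidden in low-importance parameters may be destroyed when those
    parameters are set to zero, while the dominant weights that carry
    the main task behavior are kept.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Toy "layer": stand-in for one parameter tensor of a watermarked DNN.
rng = np.random.default_rng(0)
layer = rng.normal(scale=0.1, size=(64, 64))
attacked = magnitude_prune(layer, sparsity=0.3)
print(np.mean(attacked == 0.0))  # fraction of weights zeroed (close to sparsity)
```

In a real attack the adversary would prune an actual model's layers and then fine-tune briefly on clean data to recover any accuracy lost, which is also why pruning and fine-tuning are usually discussed together as removal attacks.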
